49 research outputs found
SuperSpike: Supervised learning in multi-layer spiking neural networks
A vast majority of computation in the brain is performed by spiking neural
networks. Despite the ubiquity of such spiking, we currently lack an
understanding of how biological spiking neural circuits learn and compute
in-vivo, as well as how we can instantiate such capabilities in artificial
spiking circuits in-silico. Here we revisit the problem of supervised learning
in temporally coding multi-layer spiking neural networks. First, by using a
surrogate gradient approach, we derive SuperSpike, a nonlinear voltage-based
three factor learning rule capable of training multi-layer networks of
deterministic integrate-and-fire neurons to perform nonlinear computations on
spatiotemporal spike patterns. Second, inspired by recent results on feedback
alignment, we compare the performance of our learning rule under different
credit assignment strategies for propagating output errors to hidden units.
Specifically, we test uniform, symmetric and random feedback, finding that
simpler tasks can be solved with any type of feedback, while more complex tasks
require symmetric feedback. In summary, our results open the door to obtaining
a better scientific understanding of learning and computation in spiking neural
networks by advancing our ability to train them to solve nonlinear problems
involving transformations between different spatiotemporal spike-time patterns
Improving equilibrium propagation without weight symmetry through Jacobian homeostasis
Equilibrium propagation (EP) is a compelling alternative to the
backpropagation of error algorithm (BP) for computing gradients of neural
networks on biological or analog neuromorphic substrates. Still, the algorithm
requires weight symmetry and infinitesimal equilibrium perturbations, i.e.,
nudges, to estimate unbiased gradients efficiently. Both requirements are
challenging to implement in physical systems. Yet, whether and how weight
asymmetry affects its applicability is unknown because, in practice, it may be
masked by biases introduced through the finite nudge. To address this question,
we study generalized EP, which can be formulated without weight symmetry, and
analytically isolate the two sources of bias. For complex-differentiable
non-symmetric networks, we show that the finite nudge does not pose a problem,
as exact derivatives can still be estimated via a Cauchy integral. In contrast,
weight asymmetry introduces bias resulting in low task performance due to poor
alignment of EP's neuronal error vectors compared to BP. To mitigate this
issue, we present a new homeostatic objective that directly penalizes
functional asymmetries of the Jacobian at the network's fixed point. This
homeostatic objective dramatically improves the network's ability to solve
complex tasks such as ImageNet 32x32. Our results lay the theoretical
groundwork for studying and mitigating the adverse effects of imperfections of
physical networks on learning algorithms that rely on the substrate's
relaxation dynamics
The remarkable robustness of surrogate gradient learning for instilling complex function in spiking neural networks
Brains process information in spiking neural networks. Their intricate connections shape the diverse functions these networks perform. In comparison, the functional capabilities of models of spiking networks are still rudimentary. This shortcoming is mainly due to the lack of insight and practical algorithms to construct the necessary connectivity. Any such algorithm typically attempts to build networks by iteratively reducing the error compared to a desired output. But assigning credit to hidden units in multi-layered spiking networks has remained challenging due to the non-differentiable nonlinearity of spikes. To avoid this issue, one can employ surrogate gradients to discover the required connectivity in spiking network models. However, the choice of a surrogate is not unique, raising the question of how its implementation influences the effectiveness of the method. Here, we use numerical simulations to systematically study how essential design parameters of surrogate gradients impact learning performance on a range of classification problems. We show that surrogate gradient learning is robust to different shapes of underlying surrogate derivatives, but the choice of the derivative’s scale can substantially affect learning performance. When we combine surrogate gradients with a suitable activity regularization technique, robust information processing can be achieved in spiking networks even at the sparse activity limit. Our study provides a systematic account of the remarkable robustness of surrogate gradient learning and serves as a practical guide to model functional spiking neural networks
Memory formation and recall in recurrent spiking neural networks
Our brain has the capacity to analyze a visual scene in a split second, to learn how to play an instrument, and to remember events, faces and concepts. Neurons underlie all of these diverse functions. Neurons, cells within the brain that generate and transmit electrical activity, communicate with each other through chemical synapses. These synaptic connections dynamically change with experience, a process referred to as synaptic plasticity, which is thought to be at the core of the brain's ability to learn and process the world in sophisticated ways. Our understanding of the rules of synaptic plasticity remains quite limited. To enable efficient computations among neurons or to serve as a trace of memory, synapses must create stable connectivity patterns between neurons. However there remains an insufficient theoretical explanation as to how stable connectivity patterns can form in the presence of synaptic plasticity. Since the dynamics of recurrently connected neurons depend upon their connections, which themselves change in response to the network dynamics, synaptic plasticity and network dynamics have to be treated as a compound system. Due to the nonlinear nature of the system this can be analytically challenging. Utilizing network simulations that model the interplay between the network connectivity and synaptic plasticity can provide valuable insights. However, many existing network models that implement biologically relevant forms of plasticity become unstable. This suggests that current models do not accurately describe the biological networks, which have no difficulty functioning without succumbing to exploding network activity. The instability in these network simulations could originate from the fact that theoretical studies have, almost exclusively, focused on Hebbian plasticity at excitatory synapses. Hebbian plasticity causes connected neurons that are active together to increase the connection strength between them. Biological networks, however, display a large variety of different forms of synaptic plasticity and homeostatic mechanisms, beyond Hebbian plasticity. Furthermore, inhibitory cells can undergo synaptic plasticity as well. These diverse forms of plasticity are active at the same time, and our understanding of the computational role of most of these synaptic dynamics remains elusive. This raises the important question as to whether forms of plasticity that have not been previously considered could -in combination with Hebbian plasticity- lead to stable network dynamics. Here we illustrate that by combining multiple forms of plasticity with distinct roles, a recurrently connected spiking network model self-organizes to distinguish and extract multiple overlapping external stimuli. Moreover we show that the acquired network structures remain stable over hours while plasticity is active. This long-term stability allows the network to function as an associative memory and to correctly classify distorted or partially cued stimuli. During intervals in which no stimulus is shown the network dynamically remembers the last stimulus as selective delay activity. Taken together this work suggest that multiple forms of plasticity and homeostasis on different timescales have to work together to create stable connectivity patterns in neuronal networks which enable them to perform relevant computation
Predictor networks and stop-grads provide implicit variance regularization in BYOL/SimSiam
Self-supervised learning (SSL) learns useful representations from unlabelled
data by training networks to be invariant to pairs of augmented versions of the
same input. Non-contrastive methods avoid collapse either by directly
regularizing the covariance matrix of network outputs or through asymmetric
loss architectures, two seemingly unrelated approaches. Here, by building on
DirectPred, we lay out a theoretical framework that reconciles these two views.
We derive analytical expressions for the representational learning dynamics in
linear networks. By expressing them in the eigenspace of the embedding
covariance matrix, where the solutions decouple, we reveal the mechanism and
conditions that provide implicit variance regularization. These insights allow
us to formulate a new isotropic loss function that equalizes eigenvalue
contribution and renders learning more robust. Finally, we show empirically
that our findings translate to nonlinear networks trained on CIFAR-10 and
STL-10
Surrogate Gradient Learning in Spiking Neural Networks
Spiking neural networks are nature's versatile solution to fault-tolerant and
energy efficient signal processing. To translate these benefits into hardware,
a growing number of neuromorphic spiking neural network processors attempt to
emulate biological neural networks. These developments have created an imminent
need for methods and tools to enable such systems to solve real-world signal
processing problems. Like conventional neural networks, spiking neural networks
can be trained on real, domain specific data. However, their training requires
overcoming a number of challenges linked to their binary and dynamical nature.
This article elucidates step-by-step the problems typically encountered when
training spiking neural networks, and guides the reader through the key
concepts of synaptic plasticity and data-driven learning in the spiking
setting. To that end, it gives an overview of existing approaches and provides
an introduction to surrogate gradient methods, specifically, as a particularly
flexible and efficient method to overcome the aforementioned challenges
Learning Multi-Stability in Plastic Neural Networks
How is the connectivity learned